10. Different types of Spark Functions

Transformations and Actions

There are two types of functions in Spark:

  1. Transformations
  2. Actions

Spark uses lazy evaluation for RDDs and DataFrames. Lazy evaluation means the code is not executed until its result is actually needed. Action functions are what trigger the lazily built-up transformations.

For example,

df = spark.read.csv("path/to/input.csv", header=True)
df1 = df.select("some_column").filter("some_column IS NOT NULL")
df1.write.csv("path/to/output")
  • In this code, select and filter are transformation functions, and write is an action function.
  • If you run this code line by line, the second line completes immediately, but you will not see any job being executed in the Spark UI.
  • Only when you trigger the action write does the Spark program actually execute:
    • select --> filter --> write are chained together in the Spark UI,
    • but only write shows up under your jobs/tasks.
    • you can also see this chain before the action ever fires by printing the query plan, as shown in the sketch below.
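
If you want to see what Spark plans to run before any job is triggered, DataFrame.explain() prints the query plan without executing anything. A minimal sketch, continuing the df1 example above:

# No job has run yet: df1 is only a chain of transformations.
# explain() prints the plan Spark will execute once an action fires,
# so you can inspect the select --> filter chain without touching the Spark UI.
df1.explain(True)   # True also prints the parsed/analyzed/optimized logical plans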

This matters because you can chain transformations on an RDD or DataFrame as much as you want, but nothing actually runs until you trigger the chain with an action. And if the chain of transformations is long, your executors may need quite some time to complete all the tasks once that action finally fires.
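
As a self-contained illustration of that point, here is a small sketch that builds a chain of transformations on a DataFrame created with spark.range(), so it needs no input files (the app name and column names are made up for the example):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("lazy-eval-demo").getOrCreate()

df = spark.range(1_000_000)                          # DataFrame with a single "id" column
doubled = df.withColumn("doubled", F.col("id") * 2)  # transformation: returns immediately, no job
evens = doubled.filter(F.col("id") % 2 == 0)         # transformation: still nothing in the Spark UI

# Only this action launches a job; the whole withColumn --> filter chain runs here.
print(evens.count())

spark.stop()

Until count() is called, building the chain costs almost nothing; all of the real work shows up as a single job when the action fires.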